Amino acid translation program for full-length cDNA sequences with frameshift errors.

Authors

  • Y Fukunishi
  • Y Hayashizaki
Abstract

Here we present an amino acid translation program designed to suggest the positions of experimental frameshift errors and to predict amino acid sequences for full-length cDNA sequences that have phred scores. Our program generates artificial insertions into, and artificial deletions from, low-accuracy positions of the original sequence, thereby generating many candidate sequences. The validity of the most probable sequence (the likelihood that it represents the actual protein) is evaluated by using a score (V(a)) that is calculated in light of the Kozak consensus, preferred codon usage, and position of the initiation codon. To evaluate the software, we used a database in which, out of 612 cDNA sequences, 524 (86%) carried 773 frameshift errors in the coding sequence. Our software detected and corrected 48% of the total frameshift errors in 62% of the total cDNA sequences with frameshift errors. The false-positive rate of frameshift correction was 9%; that is, 91% of the suggested frameshifts were true.
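The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the phred threshold, the limit on edits per candidate, the use of `N` as the inserted base, and all function names are assumptions made for illustration. In particular, the abstract does not define V(a), so the sketch substitutes a much simpler stand-in score (the length of the longest ATG-to-stop open reading frame) and ignores the Kozak consensus and codon usage.

```python
from itertools import combinations, product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def low_accuracy_positions(phred, threshold=20):
    """Indices whose phred score is below the (assumed) threshold."""
    return [i for i, q in enumerate(phred) if q < threshold]


def apply_edits(seq, edits):
    """Apply (position, op) edits: 'del' drops the base, 'ins' inserts N before it."""
    out = list(seq)
    # Work from right to left so that earlier indices stay valid.
    for pos, op in sorted(edits, reverse=True):
        if op == "del":
            del out[pos]
        else:
            out.insert(pos, "N")
    return "".join(out)


def candidates(seq, phred, threshold=20, max_edits=2):
    """Yield (candidate sequence, edits), starting with the unedited sequence."""
    positions = low_accuracy_positions(phred, threshold)
    yield seq, ()
    for k in range(1, max_edits + 1):
        for combo in combinations(positions, k):
            for ops in product(("ins", "del"), repeat=k):
                edits = tuple(zip(combo, ops))
                yield apply_edits(seq, edits), edits


def longest_orf_length(seq):
    """Stand-in score (NOT V(a)): codons in the longest ORF that starts at ATG
    and ends at an in-frame stop codon; unterminated ORFs score 0."""
    best = 0
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                best = max(best, (i - start) // 3)
                break
    return best


def best_candidate(seq, phred, threshold=20, max_edits=2):
    """Return the (sequence, edits) pair with the highest stand-in score."""
    return max(candidates(seq, phred, threshold, max_edits),
               key=lambda c: longest_orf_length(c[0]))
```

As a usage example, take a hypothetical read in which one spurious base at a low-quality position removes the in-frame stop codon; deleting that base restores the open reading frame:

```python
true_seq = "ATGGCCTTAACCTTAACCTTGGCCTAA"
observed = "ATGGCCCTTAACCTTAACCTTGGCCTAA"   # extra C at index 6
phred = [40] * len(observed)
phred[6] = 10                               # the only low-accuracy position
seq, edits = best_candidate(observed, phred, max_edits=1)
```

Because the number of candidates grows combinatorially with the number of low-accuracy positions, the sketch caps the number of simultaneous edits; how the original program bounds its search is not stated in the abstract.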

Related articles

Phylogenetic relationships of Iranian Infectious Pancreatic Necrosis Virus (IPNV) based on deduced amino acid sequences of genome segment A and B cDNA

Infectious Pancreatic Necrosis Virus (IPNV) is the causal agent of a highly contagious disease that affects many species of fish and shellfish. This virus causes economically important diseases of farmed rainbow trout, Oncorhynchus mykiss, in Iran which is often associated with the transmission of pathogens from European resources. In this study, moribund rainbow trout fry were collected during...

Molecular sequence accuracy and the analysis of protein coding regions.

Molecular sequences, like all experimental data, have finite error rates. The impact of errors on the information content of molecular sequence data is dependent on the analytic paradigm used to interpret the data. We studied the impact of nucleic acid sequence errors on the ability to align predicted amino acid sequences with the sequences of related proteins. We found that with a simultaneous...

Nucleotide sequence of cDNA encoding for preprochymosin in native goat (Capra hircus) from Iran

Prochymosin is one of the most important aspartic proteinases used as a milk-clotting enzyme in cheese production. In the present investigation we report the sequence of the cDNA encoding goat (Capra hircus) preprochymosin and compare its nucleotide and deduced amino acid sequences with the preprochymosin sequences of other ruminants. Like bovine prochymosin, the caprine prochymosin cDNA encodes 365 amino ...

Alignments of DNA and protein sequences containing frameshift errors

Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels) which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data is dependent on ...


Journal:
  • Physiological genomics

Volume 5, Issue 2

Pages: -

Publication date: 2001